AITopics | statistical method

Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis

Chen, Kai, Gong, Chen, Wang, Tianhao

arXiv.org Artificial Intelligence Nov-19-2025

In differentially private (DP) tabular data synthesis, the consensus is that statistical models are better than neural network (NN)-based methods. However, we argue that this conclusion is incomplete and overlooks the challenge of densely correlated datasets, where intricate dependencies can overwhelm statistical models. In such complex scenarios, neural networks are more suitable due to their capacity to fit complex distributions by learning directly from samples. Despite this potential, existing NN-based algorithms still suffer from significant limitations. We therefore propose MargNet, incorporating successful algorithmic designs of statistical models into neural networks. MargNet applies an adaptive marginal selection strategy and trains the neural networks to generate data that conforms to the selected marginals. On sparsely correlated datasets, our approach achieves utility close to the best statistical method while offering an average 7× speedup over it. More importantly, on densely correlated datasets, MargNet establishes a new state-of-the-art, reducing fidelity error by up to 26% compared to the previous best. We release our code on GitHub (https://github.com/KaiChen9909/margnet).
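To make the notion of a "marginal" concrete, the following is a minimal sketch of measuring a marginal over selected columns and perturbing it with Gaussian noise. It is not MargNet's algorithm: the function names, the dict-based row format, and the choice of Gaussian noise are assumptions made for illustration, and the mapping from `sigma` to a privacy budget is deliberately left out.

```python
import random
from collections import Counter
from itertools import product


def marginal(rows, cols, domains):
    """Count how often each combination of values occurs in the chosen columns.

    rows    : list of dicts mapping column name -> categorical value
    cols    : tuple of column names defining the marginal
    domains : dict mapping column name -> list of all possible values
    """
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    cells = product(*(domains[c] for c in cols))
    # Include empty cells so the table shape does not depend on the data.
    return {cell: counts.get(cell, 0) for cell in cells}


def noisy_marginal(rows, cols, domains, sigma, rng):
    """Add independent Gaussian noise to every cell of the marginal.

    Assumption: sigma is supplied by the caller; calibrating it to a
    specific (epsilon, delta) guarantee is outside this sketch.
    """
    exact = marginal(rows, cols, domains)
    return {cell: n + rng.gauss(0.0, sigma) for cell, n in exact.items()}
```

A synthesizer of the kind the abstract describes would then be trained so that marginals computed on its generated rows stay close to such noisy measurements.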

artificial intelligence, dataset, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.13893

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison

Wimalasuriya, Chinthana, Tragoudas, Spyros

arXiv.org Artificial Intelligence Oct-6-2025

Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect unseen attacks or detect different attack types with a high level of accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network's deployment, enabling effective real-time adversarial detection. We generate a metric of adversarial presence by comparing the behavior of a compressed/uncompressed neural network pair. Our method has been tested against state-of-the-art techniques, and it achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.

artificial intelligence, machine learning, vector, (19 more...)

arXiv.org Artificial Intelligence

2510.02707

Country: North America > United States > Illinois (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Pulse Shape Discrimination Algorithms: Survey and Benchmark

Liu, Haoran, Zhan, Yihan, Liu, Mingzhe, Liu, Yanhua, Li, Peng, Zuo, Zhuo, Liu, Bingqi, Liu, Runxi

arXiv.org Artificial Intelligence Aug-6-2025

This review presents a comprehensive survey and benchmark of pulse shape discrimination (PSD) algorithms for radiation detection, classifying nearly sixty methods into statistical (time-domain, frequency-domain, neural network-based) and prior-knowledge (machine learning, deep learning) paradigms. We implement and evaluate all algorithms on two standardized datasets: an unlabeled set from a 241Am-9Be source and a time-of-flight labeled set from a 238Pu-9Be source, using metrics including Figure of Merit (FOM), F1-score, ROC-AUC, and inter-method correlations. Our analysis reveals that deep learning models, particularly Multi-Layer Perceptrons (MLPs) and hybrid approaches combining statistical features with neural regression, often outperform traditional methods. We discuss architectural suitabilities, the limitations of FOM, alternative evaluation metrics, and performance across energy thresholds. Accompanying this work, we release an open-source toolbox in Python and MATLAB, along with the datasets, to promote reproducibility and advance PSD research.
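For readers unfamiliar with the Figure of Merit mentioned above, here is a minimal sketch of its usual definition: the separation between the gamma and neutron peaks of the discrimination factor, divided by the sum of their full widths at half maximum. The function name is hypothetical, and the sketch assumes both peaks are roughly Gaussian and estimates their width from the sample standard deviation rather than from a fitted histogram, which is a simplification.

```python
import math
import statistics

# For a Gaussian peak, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def figure_of_merit(gamma_scores, neutron_scores):
    """FOM = |mu_n - mu_g| / (FWHM_g + FWHM_n), assuming Gaussian peaks.

    Each argument is a sequence of PSD discrimination factors for pulses
    attributed to one particle type (at least two values per sequence).
    """
    mu_g = statistics.fmean(gamma_scores)
    mu_n = statistics.fmean(neutron_scores)
    fwhm_g = FWHM_FACTOR * statistics.stdev(gamma_scores)
    fwhm_n = FWHM_FACTOR * statistics.stdev(neutron_scores)
    return abs(mu_n - mu_g) / (fwhm_g + fwhm_n)
```

A larger value means better-separated peaks. Because the definition presumes two clean Gaussian populations, it can mislead when the distributions are skewed or overlapping, which is one reason a survey might also report F1-score and ROC-AUC.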

artificial intelligence, discrimination, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.0275

Country:

Europe (0.45)
Asia > China (0.28)
North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data

Bekhit, Mahmoud, Salah, Ahmad, Alrawahi, Ahmed Salim, Attia, Tarek, Ali, Ahmed, Eldesokey, Esraa, Fathalla, Ahmed

arXiv.org Artificial Intelligence Jul-15-2025

Motion capture (MoCap) data from wearable Inertial Measurement Units (IMUs) is vital for applications in sports science, but its utility is often compromised by missing data. Despite numerous imputation techniques, a systematic performance evaluation for IMU-derived MoCap time-series data is lacking. We address this gap by conducting a comprehensive comparative analysis of statistical, machine learning, and deep learning imputation methods. Our evaluation considers three distinct contexts: univariate time-series, multivariate across subjects, and multivariate across kinematic angles. To facilitate this benchmark, we introduce the first publicly available MoCap dataset designed specifically for imputation, featuring data from 53 karate practitioners. We simulate three controlled missingness mechanisms: missing completely at random (MCAR), block missingness, and a novel value-dependent pattern at signal transition points. Our experiments, conducted on 39 kinematic variables across all subjects, reveal that multivariate imputation frameworks consistently outperform univariate approaches, particularly for complex missingness. For instance, multivariate methods achieve up to a 50% mean absolute error reduction (MAE from 10.8 to 5.8) compared to univariate techniques for transition point missingness. Advanced models like Generative Adversarial Imputation Networks (GAIN) and Iterative Imputers demonstrate the highest accuracy in these challenging scenarios. This work provides a critical baseline for future research and offers practical recommendations for improving the integrity and robustness of MoCap data analysis.
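The evaluation loop described above (mask values, impute, score on the masked positions) can be sketched as follows. This is an illustrative sketch only: all function names are hypothetical, linear interpolation stands in for the imputation methods the paper compares, and the value-dependent transition-point pattern is not attempted because the abstract does not define it.

```python
import random


def mask_mcar(n, rate, rng):
    """MCAR: each of n positions is dropped independently with probability rate."""
    return [rng.random() < rate for _ in range(n)]


def mask_block(n, block_len, rng):
    """Block missingness: one contiguous run of block_len positions is dropped."""
    start = rng.randrange(0, n - block_len + 1)
    return [start <= i < start + block_len for i in range(n)]


def impute_linear(values, mask):
    """Univariate baseline: interpolate linearly between observed neighbours.

    Only unmasked entries of values are read. Gaps touching either end of
    the series are filled with the nearest observed value.
    """
    observed = [i for i, missing in enumerate(mask) if not missing]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    out = list(values)
    for i, missing in enumerate(mask):
        if not missing:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def mae_on_missing(truth, imputed, mask):
    """Mean absolute error, computed only over the masked positions."""
    errors = [abs(t - p) for t, p, m in zip(truth, imputed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0
```

Scoring only the masked positions matters: including observed values, which every method reproduces exactly, would dilute the error and hide differences between methods.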

data quality, imputation, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2507.10334

Country:

Africa > Middle East > Egypt (0.68)
Asia > Middle East (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Dobbins, Nic, Xiong, Christelle, Lan, Kristine, Yetisgen, Meliha

arXiv.org Artificial Intelligence Jun-2-2025

Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies that appeared reproducible using this dataset alone. Using GPT-4o, we created a simulated research team of LLM-based autonomous agents tasked with writing and executing code to dynamically reproduce the findings of each study, given only study Abstracts, Methods sections, and data dictionary descriptions of the dataset. Results: We extracted 35 key findings described in the Abstracts across 5 Alzheimer's studies. On average, LLM agents approximately reproduced 53.2% of findings per study. Numeric values and range-based findings often differed between studies and agents. The agents also applied statistical methods or parameters that varied from the originals, though overall trends and significance were sometimes similar. Discussion: In some cases, LLM-based agents replicated research techniques and findings. In others, they failed due to implementation flaws or missing methodological detail. These discrepancies show the current limits of LLMs in fully automating reproducibility assessments. Still, this early investigation highlights the potential of structured agent-based systems to provide scalable evaluation of scientific rigor. Conclusion: This exploratory work illustrates both the promise and limitations of LLMs as autonomous agents for automating reproducibility in biomedical research.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.23852

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Machine Learning for Cyber-Attack Identification from Traffic Flows

Zhou, Yujing, Jacquet, Marc L., Dawit, Robel, Fabre, Skyler, Sarawat, Dev, Khan, Faheem, Newell, Madison, Liu, Yongxin, Liu, Dahai, Chen, Hongyun, Wang, Jian, Wang, Huihui

arXiv.org Artificial Intelligence May-6-2025

This paper presents our simulation of cyber-attacks and detection strategies on the traffic control system in Daytona Beach, FL, using Raspberry Pi virtual machines and the OPNSense firewall, along with traffic dynamics from SUMO and exploitation via the Metasploit framework. We try to answer the research question: can we identify cyber-attacks by analyzing traffic flow patterns alone? In this research, we focus on attacks in which adversaries randomly turn all lights green or red at busy intersections. Despite challenges stemming from imbalanced data and overlapping traffic patterns, our best model shows 85% accuracy when detecting intrusions purely using traffic flow statistics. Key indicators for successful detection included occupancy, jam length, and halting durations.

accuracy, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.01489

Country: North America > United States > Florida > Volusia County > Daytona Beach (0.26)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(3 more...)

Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks

Longjohn, Rachel, Gopalan, Giri, Casleton, Emily

arXiv.org Machine Learning Jan-7-2025

Modern artificial intelligence is supported by machine learning models (e.g., foundation models) that are pretrained on a massive data corpus and then adapted to solve a variety of downstream tasks. To summarize performance across multiple tasks, evaluation metrics are often aggregated into a summary metric, e.g., average accuracy across 10 question-answering tasks. When aggregating evaluation metrics, it is useful to incorporate uncertainty in the aggregate metric in order to gain a more realistic understanding of model performance. Our objective in this work is to demonstrate how statistical methodology can be used for quantifying uncertainty in metrics that have been aggregated across multiple tasks. The methods we emphasize are bootstrapping, Bayesian hierarchical (i.e., multilevel) modeling, and the visualization of task weightings that consider standard errors. These techniques reveal insights such as the dominance of a specific model for certain types of tasks despite an overall poor performance. We use a popular ML benchmark, the Visual Task Adaptation Benchmark (VTAB), to demonstrate the usefulness of our approaches.
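Of the techniques named above, bootstrapping is the simplest to illustrate. The sketch below resamples test examples within each task, recomputes the unweighted mean accuracy across tasks, and reports a percentile interval. The function names, the 0/1 per-example data layout, and the choice of a percentile interval are assumptions for illustration; they are not taken from the paper.

```python
import random
import statistics


def aggregate_accuracy(per_task_correct):
    """Unweighted mean of per-task accuracies.

    per_task_correct: one list per task, holding 1 for each correctly
    answered test example and 0 otherwise.
    """
    return statistics.fmean([statistics.fmean(c) for c in per_task_correct])


def bootstrap_ci(per_task_correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the aggregate accuracy.

    Examples are resampled with replacement inside each task, so the set
    of tasks itself is treated as fixed (an assumption of this sketch).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resampled = [rng.choices(c, k=len(c)) for c in per_task_correct]
        replicates.append(aggregate_accuracy(resampled))
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Resampling within tasks captures only test-set sampling noise. Treating the tasks themselves as a sample from a larger population would call for resampling tasks as well, or for a hierarchical model of the kind the abstract mentions.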

accuracy, confidence interval, task performance, (17 more...)

arXiv.org Machine Learning

2501.04234

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

A Vectorization Method Induced By Maximal Margin Classification For Persistent Diagrams

Wu, An, Pan, Yu, Zhou, Fuqi, Yan, Jinghui, Liu, Chuanlu

arXiv.org Artificial Intelligence Jul-30-2024

Persistent homology is an effective method for extracting topological information, represented as persistent diagrams, from spatial structure data. Hence it is well-suited for the study of protein structures. Attempts to incorporate persistent homology into machine learning methods for protein function prediction have resulted in several techniques for vectorizing persistent diagrams. However, current vectorization methods are excessively artificial and cannot ensure the effective utilization of information or the rationality of the methods. To address this problem, we propose a more geometrical vectorization method for persistent diagrams based on maximal margin classification for Banach space, and additionally propose a framework that utilizes topological data analysis to identify proteins with specific functions. We evaluated our vectorization method using a binary classification task on proteins and compared it with the statistical methods that exhibit the best performance among thirteen commonly used vectorization methods. The experimental results indicate that our approach surpasses the statistical methods in both robustness and precision.

classification, persistent diagram, protein, (14 more...)

arXiv.org Artificial Intelligence

2407.21298

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States (0.04)
Asia > Japan > Honshū > Chūgoku > Shimane Prefecture > Matsue (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.30)

Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

Li, Chenxi, Kumar, Abhinav, Guo, Zhen, Hou, Jie, Tourani, Reza

arXiv.org Artificial Intelligence Jul-1-2024

The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root causes of attacks based on raw data features. In this paper, we aim to address these knowledge gaps by first exploring statistical approaches to identify the most informative neurons and quantifying the significance of the hidden activations from the selected neurons on attack accuracy, in isolation and combination. Additionally, we propose an attack-driven explainable framework by integrating the target and attack models to identify the most influential features of raw data that lead to successful membership inference attacks. Our proposed MIA shows an improvement of up to 26% on state-of-the-art MIA.

dataset, neuron, target model, (16 more...)

arXiv.org Artificial Intelligence

2407.01306

Country:

North America > United States > Missouri > St. Louis County > St. Louis (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

Zhang, Haopeng, Yu, Philip S., Zhang, Jiawei

arXiv.org Artificial Intelligence Jun-17-2024

Text summarization research has undergone several significant transformations with the advent of deep neural networks, pre-trained language models (PLMs), and recent large language models (LLMs). This survey thus provides a comprehensive review of the research progress and evolution in text summarization through the lens of these paradigm shifts. It is organized into two main parts: (1) a detailed overview of datasets, evaluation metrics, and summarization methods before the LLM era, encompassing traditional statistical methods, deep learning approaches, and PLM fine-tuning techniques, and (2) the first detailed examination of recent advancements in benchmarking, modeling, and evaluating summarization in the LLM era. By synthesizing existing literature and presenting a cohesive overview, this survey also discusses research trends and open challenges and proposes promising research directions in summarization, aiming to guide researchers through the evolving landscape of summarization research.

arxiv preprint arxiv, proceedings, summarization, (11 more...)

arXiv.org Artificial Intelligence

2406.11289

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
North America > United States > California > Yolo County > Davis (0.04)
(8 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Health & Medicine (1.00)
Media > News (0.93)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
